AITopics | pixel-level annotation

Collaborating Authors

pixel-level annotation

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

EPIC-KITCHENS VISOR Benchmark: VIdeo Segmentations and Object Relations

Neural Information Processing SystemsDec-24-2025, 06:26:20 GMT

We introduce VISOR, a new dataset of pixel annotations and a benchmark suite for segmenting hands and active objects in egocentric video. VISOR annotates videos from EPIC-KITCHENS, which comes with a new set of challenges not encountered in current video segmentation datasets. Specifically, we need to ensure both short-and long-term consistency of pixel-level annotations as objects undergo transformative interactions, e.g. an onion is peeled, diced and cooked - where we aim to obtain accurate pixel-level annotations of the peel, onion pieces, chopping board, knife, pan, as well as the acting hands. VISOR introduces an annotation pipeline, AI-powered in parts, for scalability and quality. In total, we publicly release 272K manual semantic masks of 257 object classes, 9.9M interpolated dense masks, 67K hand-object relations, covering 36 hours of 179 untrimmed videos.

epic-kitchen visor benchmark, name change, video segmentation and object relation, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Vision (0.57)

Add feedback

590a7ebe0da1f262c80d0188f5c4c222-Paper-Datasets_and_Benchmarks.pdf

Neural Information Processing SystemsAug-15-2025, 00:47:07 GMT

annotation, computer vision, computer vision and pattern recognition, (13 more...)

Neural Information Processing Systems

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Michigan (0.04)
Europe > United Kingdom > England > Bristol (0.04)

Genre: Research Report (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

TagGAN: A Generative Model for Data Tagging

Nawaz, Muhammad, Nasir, Basma, Zia, Tehseen, Hussain, Zawar, Moreira, Catarina

arXiv.org Artificial IntelligenceFeb-24-2025

Precise identification and localization of disease-specific features at the pixel-level are particularly important for early diagnosis, disease progression monitoring, and effective treatment in medical image analysis. However, conventional diagnostic AI systems lack decision transparency and cannot operate well in environments where there is a lack of pixel-level annotations. In this study, we propose a novel Generative Adversarial Networks (GANs)-based framework, TagGAN, which is tailored for weakly-supervised fine-grained disease map generation from purely image-level labeled data. TagGAN generates a pixel-level disease map during domain translation from an abnormal image to a normal representation. Later, this map is subtracted from the input abnormal image to convert it into its normal counterpart while preserving all the critical anatomical details. Our method is first to generate fine-grained disease maps to visualize disease lesions in a weekly supervised setting without requiring pixel-level annotations. This development enhances the interpretability of diagnostic AI by providing precise visualizations of disease-specific regions. It also introduces automated binary mask generation to assist radiologists. Empirical evaluations carried out on the benchmark datasets, CheXpert, TBX11K, and COVID-19, demonstrate the capability of TagGAN to outperform current top models in accurately identifying disease-specific pixels. This outcome highlights the capability of the proposed model to tag medical images, significantly reducing the workload for radiologists by eliminating the need for binary masks during training.

annotation, binary mask, disease map, (15 more...)

arXiv.org Artificial Intelligence

2502.17836

Country:

Oceania > Australia > New South Wales > Sydney (0.04)
Europe > Italy (0.04)
Asia > Pakistan > Islamabad Capital Territory > Islamabad (0.04)
Africa > Rwanda (0.04)

Genre: Research Report > New Finding (0.48)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Weakly Supervised Pixel-Level Annotation with Visual Interpretability

Nasir, Basma, Zia, Tehseen, Nawaz, Muhammad, Moreira, Catarina

arXiv.org Artificial IntelligenceFeb-24-2025

Medical image annotation is essential for diagnosing diseases, yet manual annotation is time-consuming, costly, and prone to variability among experts. To address these challenges, we propose an automated explainable annotation system that integrates ensemble learning, visual explainability, and uncertainty quantification. Our approach combines three pre-trained deep learning models - ResNet50, EfficientNet, and DenseNet - enhanced with XGrad-CAM for visual explanations and Monte Carlo Dropout for uncertainty quantification. This ensemble mimics the consensus of multiple radiologists by intersecting saliency maps from models that agree on the diagnosis while uncertain predictions are flagged for human review. We evaluated our system using the TBX11K medical imaging dataset and a Fire segmentation dataset, demonstrating its robustness across different domains. Experimental results show that our method outperforms baseline models, achieving 93.04% accuracy on TBX11K and 96.4% accuracy on the Fire dataset. Moreover, our model produces precise pixel-level annotations despite being trained with only image-level labels, achieving Intersection over Union IoU scores of 36.07% and 64.7%, respectively. By enhancing the accuracy and interpretability of image annotations, our approach offers a reliable and transparent solution for medical diagnostics and other image analysis tasks.

annotation, pixel-level annotation, prediction, (13 more...)

arXiv.org Artificial Intelligence

2502.17824

Country:

North America > United States (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)
Asia > Pakistan > Islamabad Capital Territory > Islamabad (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

EPIC-KITCHENS VISOR Benchmark: VIdeo Segmentations and Object Relations

Neural Information Processing SystemsOct-11-2024, 04:10:02 GMT

We introduce VISOR, a new dataset of pixel annotations and a benchmark suite for segmenting hands and active objects in egocentric video. VISOR annotates videos from EPIC-KITCHENS, which comes with a new set of challenges not encountered in current video segmentation datasets. Specifically, we need to ensure both short- and long-term consistency of pixel-level annotations as objects undergo transformative interactions, e.g. an onion is peeled, diced and cooked - where we aim to obtain accurate pixel-level annotations of the peel, onion pieces, chopping board, knife, pan, as well as the acting hands. VISOR introduces an annotation pipeline, AI-powered in parts, for scalability and quality. In total, we publicly release 272K manual semantic masks of 257 object classes, 9.9M interpolated dense masks, 67K hand-object relations, covering 36 hours of 179 untrimmed videos.

epic-kitchen visor benchmark, pixel-level annotation, video segmentation and object relation, (1 more...)

Neural Information Processing Systems

Genre: Play > Prospect > Charge > Source > Source Maturation (1.00)

Technology: Information Technology > Artificial Intelligence > Vision (0.62)

Add feedback

Knowledge Extraction and Distillation from Large-Scale Image-Text Colonoscopy Records Leveraging Large Language and Vision Models

Wang, Shuo, Zhu, Yan, Luo, Xiaoyuan, Yang, Zhiwei, Zhang, Yizhe, Fu, Peiyao, Wang, Manning, Song, Zhijian, Li, Quanlin, Zhou, Pinghong, Guo, Yike

arXiv.org Artificial IntelligenceOct-17-2023

The development of artificial intelligence systems for colonoscopy analysis often necessitates expert-annotated image datasets. However, limitations in dataset size and diversity impede model performance and generalisation. Image-text colonoscopy records from routine clinical practice, comprising millions of images and text reports, serve as a valuable data source, though annotating them is labour-intensive. Here we leverage recent advancements in large language and vision models and propose EndoKED, a data mining paradigm for deep knowledge extraction and distillation. EndoKED automates the transformation of raw colonoscopy records into image datasets with pixel-level annotation. We validate EndoKED using multi-centre datasets of raw colonoscopy records (~1 million images), demonstrating its superior performance in training polyp detection and segmentation models. Furthermore, the EndoKED pre-trained vision backbone enables data-efficient and generalisable learning for optical biopsy, achieving expert-level performance in both retrospective and prospective validation.

annotation, colonoscopy record, dataset, (13 more...)

arXiv.org Artificial Intelligence

2310.11173

Country:

Asia > China > Shanghai > Shanghai (0.05)
Europe > Austria > Vienna (0.04)
Asia > China > Henan Province > Zhengzhou (0.04)
(6 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology > Colorectal Cancer (1.00)
Health & Medicine > Therapeutic Area > Gastroenterology (1.00)
Health & Medicine > Diagnostic Medicine (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Add feedback

CARE: A Large Scale CT Image Dataset and Clinical Applicable Benchmark Model for Rectal Cancer Segmentation

Zhang, Hantao, Guo, Weidong, Qiu, Chenyang, Wan, Shouhong, Zou, Bingbing, Wang, Wanqin, Jin, Peiquan

arXiv.org Artificial IntelligenceAug-16-2023

Rectal cancer segmentation of CT image plays a crucial role in timely clinical diagnosis, radiotherapy treatment, and follow-up. Although current segmentation methods have shown promise in delineating cancerous tissues, they still encounter challenges in achieving high segmentation precision. These obstacles arise from the intricate anatomical structures of the rectum and the difficulties in performing differential diagnosis of rectal cancer. Additionally, a major obstacle is the lack of a large-scale, finely annotated CT image dataset for rectal cancer segmentation. To address these issues, this work introduces a novel large scale rectal cancer CT image dataset CARE with pixel-level annotations for both normal and cancerous rectum, which serves as a valuable resource for algorithm research and clinical application development. Moreover, we propose a novel medical cancer lesion segmentation benchmark model named U-SAM. The model is specifically designed to tackle the challenges posed by the intricate anatomical structures of abdominal organs by incorporating prompt information. U-SAM contains three key components: promptable information (e.g., points) to aid in target area localization, a convolution module for capturing low-level lesion details, and skip-connections to preserve and recover spatial information during the encoding-decoding process. To evaluate the effectiveness of U-SAM, we systematically compare its performance with several popular segmentation methods on the CARE dataset. The generalization of the model is further verified on the WORD dataset. Extensive experiments demonstrate that the proposed U-SAM outperforms state-of-the-art methods on these two datasets. These experiments can serve as the baseline for future research and clinical application development.

artificial intelligence, machine learning, segmentation, (16 more...)

arXiv.org Artificial Intelligence

2308.08283

Country: Asia > China > Anhui Province > Hefei (0.04)

Genre: Research Report > Promising Solution (0.34)

Industry:

Health & Medicine > Therapeutic Area > Oncology > Colorectal Cancer (1.00)
Health & Medicine > Therapeutic Area > Gastroenterology (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (0.98)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

From slides (through tiles) to pixels: an explainability framework for weakly supervised models in pre-clinical pathology

Bertolini, Marco, Le, Van-Khoa, Pencharz, Jake, Poehlmann, Andreas, Clevert, Djork-Arné, Villalba, Santiago, Montanari, Floriane

arXiv.org Artificial IntelligenceFeb-3-2023

In pre-clinical pathology, there is a paradox between the abundance of raw data (whole slide images from many organs of many individual animals) and the lack of pixel-level slide annotations done by pathologists. Due to time constraints and requirements from regulatory authorities, diagnoses are instead stored as slide labels. Weakly supervised training is designed to take advantage of those data, and the trained models can be used by pathologists to rank slides by their probability of containing a given lesion of interest. In this work, we propose a novel contextualized eXplainable AI (XAI) framework and its application to deep learning models trained on Whole Slide Images (WSIs) in Digital Pathology. Specifically, we apply our methods to a multi-instance-learning (MIL) model, which is trained solely on slide-level labels, without the need for pixel-level annotations. We validate quantitatively our methods by quantifying the agreements of our explanations' heatmaps with pathologists' annotations, as well as with predictions from a segmentation model trained on such annotations. We demonstrate the stability of the explanations with respect to input shifts, and the fidelity with respect to increased model performance. We quantitatively evaluate the correlation between available pixel-wise annotations and explainability heatmaps. We show that the explanations on important tiles of the whole slide correlate with tissue changes between healthy regions and lesions, but do not exactly behave like a human annotator. This result is coherent with the model training strategy.

artificial intelligence, explanation, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2302.01653

Country:

North America > United States (0.14)
Europe > Germany > Berlin (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Europe > France (0.04)

Genre:

Research Report > New Finding (0.68)
Research Report > Experimental Study (0.68)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Meta Mask Correction for Nuclei Segmentation in Histopathological Image

Shi, Jiangbo, Jia, Chang, Gao, Zeyu, Gong, Tieliang, Wang, Chunbao, Li, Chen

arXiv.org Artificial IntelligenceNov-24-2021

Nuclei segmentation is a fundamental task in digital pathology analysis and can be automated by deep learning-based methods. However, the development of such an automated method requires a large amount of data with precisely annotated masks which is hard to obtain. Training with weakly labeled data is a popular solution for reducing the workload of annotation. In this paper, we propose a novel meta-learning-based nuclei segmentation method which follows the label correction paradigm to leverage data with noisy masks. Specifically, we design a fully conventional meta-model that can correct noisy masks using a small amount of clean meta-data. Then the corrected masks can be used to supervise the training of the segmentation model. Meanwhile, a bi-level optimization method is adopted to alternately update the parameters of the main segmentation model and the meta-model in an end-to-end way. Extensive experimental results on two nuclear segmentation datasets show that our method achieves the state-of-the-art result. It even achieves comparable performance with the model training on supervised data in some noisy settings.

annotation, segmentation, segmentation model, (13 more...)

arXiv.org Artificial Intelligence

2111.12498

Country: Asia > China > Shaanxi Province > Xi'an (0.06)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Therapeutic Area > Oncology (0.68)
Health & Medicine > Diagnostic Medicine > Imaging (0.49)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Transferable Semi-Supervised Semantic Segmentation

Xiao, Huaxin (National University of Defense Technology) | Wei, Yunchao (Beckman Institute, University of Illinois at Urbana-Champaign) | Liu, Yu (National University of Defense Technology) | Zhang, Maojun (National University of Defense Technology) | Feng, Jiashi (National University of Singapore)

AAAI ConferencesFeb-8-2018

The performance of deep learning based semantic segmentation models heavily depends on sufficient data with careful annotations. However, even the largest public datasets only provide samples with pixel-level annotations for rather limited semantic categories. Such data scarcity critically limits scalability and applicability of semantic segmentation models in real applications. In this paper, we propose a novel transferable semi-supervised semantic segmentation model that can transfer the learned segmentation knowledge from a few strong categories with pixel-level annotations to unseen weak categories with only image-level annotations, significantly broadening the applicable territory of deep segmentation models. In particular, the proposed model consists of two complementary and learnable components: a Label transfer Network (L-Net) and a Prediction transfer Network (P-Net). The L-Net learns to transfer the segmentation knowledge from strong categories to the images in the weak categories and produces coarse pixel-level semantic maps, by effectively exploiting the similar appearance shared across categories. Meanwhile, the P-Net tailors the transferred knowledge through a carefully designed adversarial learning strategy and produces refined segmentation results with better details. Integrating the L-Net and P-Net achieves 96.5% and 89.4% performance of the fully-supervised baseline using 50% and 0% categories with pixel-level annotations respectively on PASCAL VOC 2012. With such a novel transfer mechanism, our proposed model is easily generalizable to a variety of new categories, only requiring image-level annotations, and offers appealing scalability in real applications.

artificial intelligence, category, machine learning, (18 more...)

AAAI Conferences

Thirty-Second AAAI Conference on Artificial Intelligence

Country: Asia (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.86)

Add feedback